Transformational Priors Over Grammars

Author

  • Jason Eisner
Published in: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Philadelphia, July 2002, pp. 63-70. Association for Computational Linguistics.

Abstract

This paper proposes a novel class of PCFG parameterizations that support linguistically reasonable priors over PCFGs. To estimate the parameters is to discover a notion of relatedness among context-free rules such that related rules tend to have related probabilities. The prior favors grammars in which the relationships are simple to describe and have few major exceptions. A basic version that bases relatedness on weighted edit distance yields superior smoothing of grammars learned from the Penn Treebank (a 20% reduction in rule perplexity over the best previous method).

1 A Sketch of the Concrete Problem

This paper uses a new kind of statistical model to smooth the probabilities of PCFG rules. It focuses on “flat” or “dependency-style” rules. These resemble subcategorization frames, but include adjuncts as well as arguments. The verb put typically generates three dependents, namely a subject NP at its left and an object NP and goal PP at its right:

• S → NP put NP PP: Jim put [the pizza] [in the oven]

But put may also take other dependents, in other rules:

• S → NP Adv put NP PP: Jim often put [a pizza] [in the oven]
• S → NP put NP PP PP: Jim put soup [in an oven] [at home]
• S → NP put NP: Jim put [some shares of IBM stock]
• S → NP put Prt NP: Jim put away [the sauce]
• S → TO put NP PP: to put [the pizza] [in the oven]
• S → NP put NP PP SBAR: Jim put it [to me] [that . . . ]

These other rules arise if put can add, drop, reorder, or retype its dependents. These edit operations on rules are semantically motivated and quite common (Table 1). We wish to learn contextual probabilities for the edit operations, based on an observed sample of flat rules. In English we should discover, for example, that it is quite common to add or delete a PP at the right edge of a rule. These contextual edit probabilities will help us guess the true probabilities of novel or little-observed rules.

However, rules are often idiosyncratic. Our smoothing method should not keep us from noticing (given enough evidence) that put takes a PP more often than most verbs do. Hence this paper's proposal is a Bayesian smoothing method that allows idiosyncrasy in the grammar while presuming regularity to be more likely a priori.

The model will assign a positive probability to each of the infinitely many formally possible rules. The following bizarre rule is not observed in training and seems very unlikely; but there is no formal reason to rule it out, and it might help us parse an unlikely test sentence, so the model will allow it some tiny probability:

• S → NP Adv PP put PP PP PP NP AdjP S
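To make the edit-operation view concrete, the following is a minimal Python sketch (not from the paper) of a weighted edit distance between the right-hand sides of two flat rules. The hand-set costs for adding, dropping, or retyping a dependent are illustrative assumptions; the paper's model instead learns contextual edit probabilities from data, and it also allows reordering, which plain edit distance does not capture.

```python
from typing import Sequence

# Illustrative, hand-set costs (assumptions for this sketch). The paper
# learns contextual edit probabilities rather than fixing costs like these.
ADD_COST = {"PP": 0.5, "Adv": 0.7}   # cheap, frequent adjuncts
DEFAULT_ADD = 1.0                    # cost of adding any other symbol
RETYPE_COST = 0.8                    # cost of retyping one dependent

def edit_distance(rhs1: Sequence[str], rhs2: Sequence[str]) -> float:
    """Weighted edit distance between two rule right-hand sides.

    For simplicity, dropping a symbol costs the same as adding it.
    """
    m, n = len(rhs1), len(rhs2)
    # d[i][j] = cheapest way to turn rhs1[:i] into rhs2[:j]
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + ADD_COST.get(rhs1[i - 1], DEFAULT_ADD)
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + ADD_COST.get(rhs2[j - 1], DEFAULT_ADD)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if rhs1[i - 1] == rhs2[j - 1] else RETYPE_COST
            d[i][j] = min(
                d[i - 1][j] + ADD_COST.get(rhs1[i - 1], DEFAULT_ADD),  # drop
                d[i][j - 1] + ADD_COST.get(rhs2[j - 1], DEFAULT_ADD),  # add
                d[i - 1][j - 1] + sub,                                 # keep/retype
            )
    return d[m][n]

# Related rules for "put" are a few cheap edits apart:
print(edit_distance("NP put NP PP".split(), "NP put NP PP PP".split()))   # 0.5 (add PP)
print(edit_distance("NP put NP PP".split(), "NP Adv put NP PP".split()))  # 0.7 (add Adv)
```

Under such a metric, the frequent put rules above cluster tightly together, while the bizarre nine-dependent rule sits far from all of them and would accordingly receive only a tiny share of probability mass.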
2 Background and Other Approaches

A PCFG is a conditional probability function p(RHS | LHS). For example, p(V NP PP | VP) gives the probability of the rule VP → V NP PP. With lexicalized nonterminals, it has the form p(V_put NP_pizza PP_in | VP_put). Usually one makes an independence assumption and defines this as p(V_put NP PP | VP_put) times factors that choose the dependent headwords pizza and in according to the selectional preferences of put. This paper is about estimating the first factor, p(V_put NP PP | VP_put).

In supervised learning, it is simplest to use a maximum likelihood estimate (perhaps with backoff from put). Charniak (1997) calls this a “Treebank grammar” and gambles that assigning 0 probability to rules unseen in training data will not hurt parsing accuracy too much. However, there are four reasons not to use a Treebank grammar. First, ignoring unseen rules necessarily sacrifices some accuracy. Second, we will show that it improves accuracy to flatten the parse trees and use flat, dependency-style rules like p(NP put NP PP | S_put); this avoids overly strong independence assumptions, but it increases the number of unseen rules and so makes Treebank grammars less tenable. Third, backing off from the word is a crude technique that does not distinguish among words. Fourth, one would eventually like to reduce or eliminate supervision, and then generalization is important to constrain the search to reasonable grammars.

To smooth the distribution p(RHS | LHS), one can define it in terms of a set of parameters and then estimate those parameters. Most researchers have used an n-gram model (Eisner, 1996; Charniak, 2000) or a more general Markov model (Alshawi, 1996) to model the sequence of nonterminals in the RHS. The sequence V_put NP PP in our example is then assumed to be emitted by some Markov model of VP_put rules (again with backoff from put). Collins (1997, model 2) uses a more sophisticated model in which all arguments in this sequence are generated jointly, as in a Treebank grammar, and then a Markov process is used to insert adjuncts among the arguments.

While Treebank models overfit the training data, Markov models underfit. A simple compromise (novel to this paper) is a hybrid Treebank/Markov model, which backs off from a Treebank model to a Markov model. Like this paper's main proposal, it can learn well-observed idiosyncratic rules but generalizes when data are sparse. Nonstandardly, this allows infinitely many rules with p > 0. One might do better by backing off to word clusters, which Charniak (1997) did find provided a small benefit. Carroll and Rooth (1998) used a similar hybrid technique.
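As a concrete reference point for the Markov parameterizations above, here is a minimal bigram sketch in Python (an illustration, not any cited author's actual model). One such model would be trained per LHS, e.g. one for VP_put rules; the class name, add-alpha smoothing, and vocabulary size are all assumptions of this sketch.

```python
from collections import defaultdict

# Padding symbols so that rule length is modeled too; this is what lets
# the model assign p > 0 to RHS sequences never seen in training.
BOS, EOS = "<s>", "</s>"

class BigramRHSModel:
    """Bigram model over the nonterminal sequence of a rule's RHS."""

    def __init__(self, alpha: float = 0.1, vocab_size: int = 50):
        self.alpha = alpha            # add-alpha smoothing (assumed)
        self.vocab_size = vocab_size  # number of symbol types (assumed)
        self.bigram = defaultdict(lambda: defaultdict(int))
        self.context = defaultdict(int)

    def train(self, rules):
        """rules: iterable of RHS token lists, e.g. [["NP", "put", "NP", "PP"], ...]"""
        for rhs in rules:
            seq = [BOS] + list(rhs) + [EOS]
            for prev, cur in zip(seq, seq[1:]):
                self.bigram[prev][cur] += 1
                self.context[prev] += 1

    def prob(self, rhs):
        """p(RHS) as a product of smoothed bigram factors."""
        p = 1.0
        seq = [BOS] + list(rhs) + [EOS]
        for prev, cur in zip(seq, seq[1:]):
            num = self.bigram[prev][cur] + self.alpha
            den = self.context[prev] + self.alpha * self.vocab_size
            p *= num / den
        return p
```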
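The paper describes the hybrid only as backing off from a Treebank estimate to a Markov model; the interpolation scheme below (a single Witten-Bell-style mixing weight, plus the class name HybridTreebankMarkov) is one plausible realization assumed purely for illustration, reusing the BigramRHSModel sketch above.

```python
from collections import Counter

class HybridTreebankMarkov:
    """Backoff from a Treebank (MLE) rule distribution to a bigram model.

    One hybrid per LHS. The Witten-Bell-style mixing weight is an
    illustrative assumption; the paper does not specify the scheme.
    """

    def __init__(self, markov: BigramRHSModel):
        self.markov = markov
        self.counts = Counter()  # count of each full RHS (stored as a tuple)
        self.total = 0

    def train(self, rules):
        for rhs in rules:
            self.counts[tuple(rhs)] += 1
            self.total += 1
        self.markov.train(rules)

    def prob(self, rhs):
        # Frequent rules keep roughly their relative frequency; unseen
        # rules fall back to the Markov model, so infinitely many rules
        # receive p > 0, as the text notes.
        if self.total == 0:
            return self.markov.prob(rhs)
        lam = self.total / (self.total + len(self.counts))  # Witten-Bell style
        mle = self.counts[tuple(rhs)] / self.total
        return lam * mle + (1 - lam) * self.markov.prob(rhs)
```

Trained on a sample of put rules, such a hybrid would give a well-observed RHS like NP put NP PP nearly its relative frequency, while the bizarre rule from Section 1 would still receive a tiny bigram-derived probability.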

